Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-46129: Make collections.query_info a single HTTP call #1074

Merged
merged 8 commits into from
Sep 11, 2024

Conversation

dhirving
Copy link
Contributor

@dhirving dhirving commented Sep 9, 2024

Previously RemoteButlerCollections.query_info would make an HTTP call for each resolved collection. It now does everything in a single call to the server instead.

RemoteButlerCollections.get_info and other collection methods in RemoteButlerRegistry have also been updated to re-use the new endpoint.

Also fixed an issue where the summary_datasets parameter to DirectButlerCollections.query_info was not doing anything.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Copy link

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 95.52239% with 6 lines in your changes missing coverage. Please review.

Project coverage is 89.68%. Comparing base (5a6ff82) to head (af34091).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
...butler/remote_butler/_remote_butler_collections.py 89.28% 1 Missing and 2 partials ⚠️
python/lsst/daf/butler/_dataset_type.py 66.66% 1 Missing and 1 partial ⚠️
...on/lsst/daf/butler/remote_butler/_remote_butler.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1074      +/-   ##
==========================================
+ Coverage   89.66%   89.68%   +0.01%     
==========================================
  Files         359      360       +1     
  Lines       46935    46988      +53     
  Branches     9652     9653       +1     
==========================================
+ Hits        42086    42139      +53     
+ Misses       3481     3480       -1     
- Partials     1368     1369       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andy-slac
Copy link
Contributor

DM-45993 was merged, please rebase and add forwarding of new query_info parameters to remote server.

Previously collections.query_info would make an HTTP call for each resolved collection.  It now does everything in a single call to the server instead.
There is no reason to add a duplicate API for this, because get_info is identical to calling query_info with a single collection name.
The various Butler sub-objects (registry, collections) need to access the registry defaults.  Added a small helper class so that they don't need a reference to RemoteButler or each other in order to fetch the defaults.
Now that there is an independent implementation of ButlerCollections for RemoteButler, forward the equivalent Registry methods to it to reduce duplication.
Update RemoteButlerCollections to pass the new optimization parameters added in DM-45993 to the server.
When attempting to add a test for the summary_datasets parameter to ButlerCollectionInfo for RemoteButler, it turned out that it does not doing anything in the DirectButler version.

This was occurring because a caching context causes this parameter to be ignored, and the cache was always enabled in query_info previously.  This cache does not have a benefit for the new implementation.
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for adding a unit test for new query_info parameters! Maybe we should try to simplify things by passing dataset type names instead of dataset types, but that is up to you.

@@ -796,3 +796,21 @@ def _unpickle_via_factory(factory: Callable, args: Any, kwargs: Any) -> DatasetT
arguments as well as positional arguments.
"""
return factory(*args, **kwargs)


def get_dataset_type_name(datasetTypeOrName: DatasetType | str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it a classmethod of DatasetType to avoid extra imports?

Comment on lines 14 to 15
from ...._dataset_ref import DatasetId, DatasetIdGenEnum, DatasetRef, DatasetType
from ...._dataset_type import get_dataset_type_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatasetType above should also be coming from _dataset_type.

@@ -280,7 +283,7 @@ def query_info(
include_parents: bool = False,
include_summary: bool = False,
include_doc: bool = False,
summary_datasets: Iterable[DatasetType] | None = None,
summary_datasets: Iterable[DatasetType] | Iterable[str] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow Iterable[str] only to avoid all complications? I think that clients of this method that actually pass summary_datasets should already have a list of names for other purposes. I guess this also means that datasets manager has to accept Iterable[str] instead of Iterable[DatasetType]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same thing, but the convention in the rest of the new query system is that dataset types can be specified as either the name or DatasetType instance so it makes sense to follow that. It's also common for this parameter to come from the result of a queryDatasetTypes call.

datasetTypeOrName : `DatasetType` | `str`
A DatasetType, or the name of a DatasetType. This union is a common
parameter in many `Butler` methods.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring also needs Returns.

@dhirving dhirving merged commit fb8c666 into main Sep 11, 2024
18 checks passed
@dhirving dhirving deleted the tickets/DM-46129 branch September 11, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants